360 research outputs found

    A Computational Approach to Reconstructing Gene Regulatory Networks.

    Get PDF
    Motivation: Many modeling frameworks have been applied to infer regulatory networks from gene expression data sets. Linear Additive Models (LAMs), as one large category of models, have been gaining more and more popularity. One problem associated with this kind of models is that the system is often under-determined because of excessive number of unknown parameters. In addition, the practical utility of these models has remained unclear. Methods: Based on LAMs, we developed an improved method to infer gene regulatory networks from time-series gene expression data sets. The method includes an incremental connectivity model with indexed regulatory elements and a linear time complexity fitting algorithm embedded with genetic algorithm. Comparing to previous LAMs, where a fully connected model is used, the new technique reduces the number of parameters by O(N), therefore increasing the chance of recovering the underlying regulatory network. The fitting algorithm increment the connectivity during the fitting process until a satisfactory fit is obtained. Results: We performed a systematic study to explore the data mining availability of LAMs. A guideline to use LAMs is provided: If the system is small (3-20 elements), more than 90% regulation pathways can be correctly determined. For a large scale system, either a clustering is needed or it is necessary to integrate other information besides expression profile only. Coupled with clustering method, we applied our method to Rat Central Nervous System development (CNS) data with 112 genes. We were able to efficiently generate regulatory networks with statistically significant pathways which have been previously predicted

    A Dynamic Bayesian Network Model for Hierarchial Classification and its Application in Predicting Yeast Genes Functions

    Get PDF
    In this paper, we propose a Dynamic Naive Bayesian (DNB) network model for classifying data sets with hierarchical labels. The DNB model is built upon a Naive Bayesian (NB) network, a successful classifier for data with flattened (nonhierarchical) class labels. The problems using flattened class labels for hierarchical classification are addressed in this paper. The DNB has a top-down structure with each level of the class hierarchy modeled as a random variable. We defined augmenting operations to transform class hierarchy into a form that satisfies the probability law. We present algorithms for efficient learning and inference with the DNB model. The learning algorithm can be used to estimate the parameters of the network. The inference algorithm is designed to find the optimal classification path in the class hierarchy. The methods are tested on yeast gene expression data sets, and the classification accuracy with DNB classifier is significantly higher than it is with previous approachesā€“ flattened classification using NB classifier

    Dynamics of asynchronous random Boolean networks with asynchrony generated by stochastic processes

    Get PDF
    An asynchronous Boolean network with N nodes whose states at each time point are determined by certain parent nodes is considered. We make use of the models developed by Matache and Heidel [Matache, M.T., Heidel, J., 2005. Asynchronous random Boolean network model based on elementary cellular automata rule 126. Phys. Rev. E 71, 026232] for a constant number of parents, and Matache [Matache, M.T., 2006. Asynchronous random Boolean network model with variable number of parents based on elementary cellular automata rule 126. IJMPB 20 (8), 897ā€“923] for a varying number of parents. In both these papers the authors consider an asynchronous updating of all nodes, with asynchrony generated by various random distributions. We supplement those results by using various stochastic processes as generators for the number of nodes to be updated at each time point. In this paper we use the following stochastic processes: Poisson process, random walk, birth and death process, Brownian motion, and fractional Brownian motion. We study the dynamics of the model through sensitivity of the orbits to initial values, bifurcation diagrams, and fixed-point analysis. The dynamics of the system show that the number of nodes to be updated at each time point is of great importance, especially for the random walk, the birth and death, and the Brownian motion processes. Small or moderate values for the number of updated nodes generate order, while large values may generate chaos depending on the underlying parameters. The Poisson process generates order. With fractional Brownian motion, as the values of the Hurst parameter increase, the system exhibits order for a wider range of combinations of the underlying parameters

    Message Passing Clustering with Stochastic Merging Based on Kernel Functions

    Get PDF
    In this paper, we propose a new Stochastic Message Passing Clustering (SMPC) algorithm for clustering biological data based on the Message Passing Clustering (MPC) algorithm, which we introduced in earlier work. MPC has shown its advantage when applied to describing parallel and spontaneous biological processes. SMPC, as a generalized version of MPC, extends the clustering algorithm from a deterministic process to a stochastic process, adding three major advantages. First, in deciding the merging cluster pair, the influences of all clusters are quantified by probabilities, estimated by kernel functions based on their relative distances. Second, the proposed algorithm property resolve the ā€œtieā€ problem, which often occurs for integer distances as in the case of protein interaction data. Third, clustering can be undone to improve the clustering performance when the algorithm detects objects which donā€™t have good probabilities inside the cluster and moves them outside. The test results on colon cancer gene-expression data show that SMPC performs better than the deterministic MPC

    Applications of Hidden Markov Models in Microarray Gene Expression Data

    Get PDF
    Hidden Markov models (HMMs) are well developed statistical models to capture hidden information from observable sequential symbols. They were first used in speech recognition in 1970s and have been successfully applied to the analysis of biological sequences since late 1980s as in finding protein secondary structure, CpG islands and families of related DNA or protein sequences [1]. In a HMM, the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In this chapter, we described two applications using HMMs to predict gene functions in yeast and DNA copy number alternations in human tumor cells, based on gene expression microarray data

    Cross-platform Analysis of Cancer Biomarkers: A Bayesian Network Approach to Incorporating Mass Spectrometry and Microarray Data

    Get PDF
    Many studies showed inconsistent cancer biomarkers due to bioinformatics artifacts. In this paper we use multiple data sets from microarrays, mass spectrometry, protein sequences, and other biological knowledge in order to improve the reliability of cancer biomarkers. We present a novel Bayesian network (BN) model which integrates and cross-annotates multiple data sets related to prostate cancer. The main contribution of this study is that we provide a method that is designed to find cancer biomarkers whose presence is supported by multiple data sources and biological knowledge. Relevant biological knowledge is explicitly encoded into the model parameters, and the biomarker finding problem is formulated as a Bayesian inference problem. Besides diagnostic accuracy, we introduce reliability as another quality measurement of the biological relevance of biomarkers. Based on the proposed BN model, we develop an empirical scoring scheme and a simulation algorithm for inferring biomarkers. Fourteen genes/proteins including prostate specific antigen (PSA) are identified as reliable serum biomarkers which are insensitive to the model assumptions. The computational results show that our method is able to find biologically relevant biomarkers with highest reliability while maintaining competitive predictive power. In addition, by combining biological knowledge and data from multiple platforms, the number of putative biomarkers is greatly reduced to allow more-focused clinical studies
    • ā€¦
    corecore